Wine Quality Analysis (Red) by Sumukha K

This dataset describes the red variant of the Portuguese “Vinho Verde” wine. For more details, consult http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. The dataset was created using red wine samples. The inputs are objective tests (e.g. pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (excellent).

Importing the required libraries

library(dplyr)
library(ggplot2)

Loading the data

# Loading the Data
df_wine<- read.csv("wineQualityReds.csv")
head(df_wine) #Viewing the first few rows of data
##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.4             0.70        0.00            1.9     0.076
## 2 2           7.8             0.88        0.00            2.6     0.098
## 3 3           7.8             0.76        0.04            2.3     0.092
## 4 4          11.2             0.28        0.56            1.9     0.075
## 5 5           7.4             0.70        0.00            1.9     0.076
## 6 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5
str(df_wine) #Structure of the data set
## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
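
Before plotting, it can help to confirm that X is only a row index and to check for missing or repeated measurements. The following is a minimal sketch using three of the columns from the six rows printed by head() above; the names wine_head, wine_clean, na_counts and dup are my own and not part of the original analysis.

```r
# A few columns of the first six rows, copied from the head(df_wine) output above.
wine_head <- data.frame(
  X = 1:6,
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L)
)

# X simply repeats the row number, so it carries no information about the wine.
x_is_row_index <- identical(wine_head$X, seq_len(nrow(wine_head)))

# Drop the index column before analysis.
wine_clean <- wine_head[, setdiff(names(wine_head), "X")]

# Count missing values per column.
na_counts <- colSums(is.na(wine_clean))

# Flag rows that repeat an earlier row exactly (row 5 repeats row 1 here).
dup <- duplicated(wine_clean)

print(x_is_row_index)
print(na_counts)
print(which(dup))
```

The same three checks should apply unchanged to the full df_wine once it has been loaded.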

Attribute Information:

  1. fixed acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (they do not evaporate readily)
  2. volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at high levels can lead to an unpleasant, vinegar taste
  3. citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines
  4. residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it is rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
  5. chlorides (sodium chloride - g / dm^3): the amount of salt in the wine
  6. free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
  7. total sulfur dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
  8. density (g / cm^3): the density of wine is close to that of water, depending on the percent alcohol and sugar content
  9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
  10. sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
  11. alcohol (% by volume): the percent alcohol content of the wine
  12. quality (score between 0 and 10)

Univariate Plots Section

In this section I conduct some preliminary exploration of the data.

  1. Fixed Acidity
summary(df_wine$fixed.acidity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90
ggplot(aes(fixed.acidity), data = df_wine)+
  geom_histogram()+
  xlab("Fixed Acidity")+
  ylab("Count")+
  ggtitle("Histogram of Fixed Acidity and Count")

  2. Volatile Acidity
summary(df_wine$volatile.acidity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800
ggplot(aes(volatile.acidity), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,1.4))+
  xlab("Volatile Acidity")+
  ylab("Count")+
  ggtitle("Histogram of Volatile Acidity and Count")

  3. Citric Acid
summary(df_wine$citric.acid)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000
ggplot(aes(citric.acid), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,0.80))+
  xlab("Citric Acid")+
  ylab("Count")+
  ggtitle("Histogram of Citric Acid and Count")

  4. Residual Sugar
summary(df_wine$residual.sugar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500
ggplot(aes(residual.sugar), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,8))+
  xlab("Residual Sugar")+
  ylab("Count")+
  ggtitle("Histogram of Residual Sugar and Count")

  5. Chlorides
summary(df_wine$chlorides)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
ggplot(aes(chlorides), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,0.3))+
  xlab("Chlorides")+
  ylab("Count")+
  ggtitle("Histogram of Chlorides and Count")

  6. Free Sulfur Dioxide
summary(df_wine$free.sulfur.dioxide)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00
ggplot(aes(free.sulfur.dioxide), data = df_wine)+
  geom_histogram()+
  xlab("Free Sulfur Dioxide")+
  ylab("Count")+
  ggtitle("Histogram of Free Sulfur Dioxide and Count")

  7. Total Sulfur Dioxide
summary(df_wine$total.sulfur.dioxide)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00
ggplot(aes(total.sulfur.dioxide), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,175))+
  xlab("Total Sulfur Dioxide")+
  ylab("Count")+
  ggtitle("Histogram of Total Sulfur Dioxide and Count")

  8. Density
summary(df_wine$density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040
ggplot(aes(density), data = df_wine)+
  geom_histogram()+
  xlab("Density")+
  ylab("Count")+
  ggtitle("Histogram of Density and Count")

  9. pH
# The column is named pH; R is case-sensitive, so df_wine$ph does not exist and returns NULL
summary(df_wine$pH)
ggplot(aes(pH), data = df_wine)+
  geom_histogram()+
  xlab("pH")+
  ylab("Count")+
  ggtitle("Histogram of pH and Count")

  10. Sulphates
summary(df_wine$sulphates)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000
ggplot(aes(sulphates), data = df_wine)+
  geom_histogram()+
  scale_x_continuous(lim = c(0,1.5))+
  xlab("Sulphates")+
  ylab("Count")+
  ggtitle("Histogram of Sulphates and Count")

  11. Alcohol
summary(df_wine$alcohol)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90
ggplot(aes(alcohol), data = df_wine)+
  geom_histogram()+
  xlab("Alcohol")+
  ylab("Count")+
  ggtitle("Histogram of Alcohol and Count")

  12. Quality
summary(df_wine$quality)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000
ggplot(df_wine, aes(x=factor(quality))) + geom_bar() +
  xlab("Quality")+
  ylab("Count")+
  ggtitle("Bar Chart of Quality and Count")

Univariate Analysis

What is the structure of your dataset?

There are 1599 observations in the dataset and 12 variables (excluding the row index X).

What is/are the main feature(s) of interest in your dataset?

My main interest in this dataset is to explore how quality is influenced by the other variables.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

All features in this dataset may help support my investigation.

Other Important observations

  1. Volatile acidity, pH and density have approximately normal distributions
  2. Most of the wines in the dataset have quality 5 or 6, with some at 7
  3. There are many outliers in the residual sugar and chlorides distributions
  4. Fixed acidity, citric acid, free sulfur dioxide, total sulfur dioxide, sulphates and alcohol have positively skewed distributions
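
The observations on skew and outliers above are read off the histograms. As a rough numeric cross-check, one could compute a moment-based skewness and count points outside the usual 1.5 * IQR boxplot fences. The helpers skewness and count_iqr_outliers below are hypothetical names of my own, and the toy vector holds made-up values for illustration only, not rows from the dataset.

```r
# Moment-based sample skewness: values above 0 suggest a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Count points lying outside the 1.5 * IQR fences used by boxplots.
count_iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Made-up values for illustration only: one large value stretches the right tail.
toy <- c(1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.6, 15.5)
print(skewness(toy) > 0)
print(count_iqr_outliers(toy))

# With the data loaded, the same calls would be, for example:
# skewness(df_wine$residual.sugar)
# count_iqr_outliers(df_wine$chlorides)
```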

I create a new variable named ratings for further analysis. The scale is as follows: bad (0-4), average (5-7), good (8-10).

# Converting quality from an integer to an ordered factor
df_wine$quality <- factor(df_wine$quality, ordered = TRUE)
#Creating a new Factored Variable called 'Ratings'
df_wine$ratings <- ifelse(df_wine$quality <= 4, 'bad', ifelse(
  df_wine$quality <= 7, 'average', 'good'))
#Ordering
df_wine$ratings <- ordered(df_wine$ratings, levels = c('bad', 'average', 'good'))
summary(df_wine$ratings)
##     bad average    good 
##      63    1518      18
ggplot(aes(x = ratings), data = df_wine)+
  geom_bar()

We can observe that most of the wines fall in the average rating range.
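
The bad / average / good binning can also be sketched with cut() applied to the integer scores, which avoids comparing an ordered factor against a number. This is an alternative to the ifelse() approach above, not a replacement for it; the vector scores is an illustrative input covering the quality values 3 to 8 seen in the summary, not data read from the file.

```r
# Illustrative integer quality scores (the observed range in this dataset is 3 to 8).
scores <- c(3, 4, 5, 6, 7, 8)

# Right-closed intervals: (-Inf, 4] -> bad, (4, 7] -> average, (7, Inf) -> good.
ratings <- cut(
  scores,
  breaks = c(-Inf, 4, 7, Inf),
  labels = c("bad", "average", "good"),
  ordered_result = TRUE
)

print(data.frame(scores, ratings))
```

On the full data this would need the integer column, so it should be run before quality is converted to a factor.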

Bivariate Plots Section

Effect of Fixed Acidity

ggplot(aes(ratings, fixed.acidity), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Fixed Acidity") +
  ggtitle("Fixed acidity v/s ratings")

ggplot(aes(quality, fixed.acidity), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Fixed Acidity") +
  ggtitle("Fixed acidity v/s quality")

Observations: I do not see any pattern of fixed acidity affecting the quality of the wine. From these plots, fixed acidity may not have any influence on quality.
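
The boxplot, jitter and mean-marker pattern used for every variable in this section could be wrapped in a small helper to avoid repeating the same five lines. The sketch below is one possible way to do it, assuming ggplot2 3.3.0 or later (where stat_summary() takes fun rather than the older fun.y). The function name plot_by_group and the toy data frame are hypothetical and the values are made up for illustration.

```r
library(ggplot2)

# Hypothetical helper: boxplot with jittered points and the group mean marked in blue.
plot_by_group <- function(df, group_var, y_var, y_label = y_var) {
  ggplot(df, aes(x = .data[[group_var]], y = .data[[y_var]])) +
    geom_boxplot(alpha = .5) +
    geom_jitter(alpha = .3) +
    # `fun` is the current name of the argument formerly called `fun.y`
    stat_summary(fun = mean, geom = "point", color = "blue", shape = 8, size = 5) +
    ylab(y_label) +
    ggtitle(paste(y_label, "v/s", group_var))
}

# Made-up values for illustration only.
toy <- data.frame(
  ratings = factor(c("bad", "bad", "average", "average", "good", "good"),
                   levels = c("bad", "average", "good"), ordered = TRUE),
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4)
)
p <- plot_by_group(toy, "ratings", "fixed.acidity", "Fixed Acidity")

# With the data loaded, a call might look like:
# plot_by_group(df_wine, "quality", "volatile.acidity", "Volatile Acidity")
```

Plots that restrict the y-axis would still need scale_y_continuous() added to the returned object.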

Effect of Volatile acidity

ggplot(aes(ratings, volatile.acidity), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Volatile Acidity") +
  ggtitle("Volatile acidity v/s ratings")

ggplot(aes(quality, volatile.acidity), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Volatile Acidity") +
  ggtitle("Volatile acidity v/s quality")

Observations: The plots show that the lower the volatile acidity, the higher the quality of the wine. An ideal wine should therefore have low volatile acidity.

Effect of Citric Acid

ggplot(aes(ratings, citric.acid), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Citric Acid") +
  ggtitle("Citric Acid v/s ratings")

ggplot(aes(quality, citric.acid), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Citric Acid") +
  ggtitle("Citric Acid v/s quality")

Observations: Citric acid appears to have a positive impact on wine quality: the higher the citric acid concentration, the better the quality.

Effect of Residual Sugar

ggplot(aes(ratings, residual.sugar), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0.5,4)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Residual Sugar") +
  ggtitle("Residual Sugar v/s ratings")

ggplot(aes(quality, residual.sugar), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0.5,4)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Residual Sugar") +
  ggtitle("Residual Sugar v/s quality")

Observations: The impact of residual sugar on quality is not clear from the above plots, so I cannot come to a conclusion. Note that I have removed outliers in these plots for better readability.

Effect of Chlorides

ggplot(aes(ratings, chlorides), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0,.2)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Chlorides") +
  ggtitle("Chlorides v/s ratings")

ggplot(aes(quality, chlorides), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0,.2)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Chlorides") +
  ggtitle("Chlorides v/s quality")

Observations: The impact of chlorides on quality is not fully clear from the above plots, but it seems that the lower the chloride concentration, the better the quality of the wine may be. I have excluded some outliers for better readability.

Effect of Free Sulfur Dioxide

ggplot(aes(ratings, free.sulfur.dioxide), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0,45)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Free Sulfur Dioxide") +
  ggtitle("Free Sulfur Dioxide v/s ratings")

ggplot(aes(quality, free.sulfur.dioxide), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .3) +
  scale_y_continuous(lim = c(0,45)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Free Sulfur Dioxide") +
  ggtitle("Free Sulfur Dioxide v/s quality")

Observations: The impact of free sulfur dioxide on quality is not clear from the above plots, so I cannot come to a conclusion. Note that I have removed outliers in these plots for better readability.

Effect of Total Sulfur Dioxide

ggplot(aes(ratings, total.sulfur.dioxide), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  scale_y_continuous(lim = c(0,150)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Total Sulfur Dioxide") +
  ggtitle("Total Sulfur Dioxide v/s ratings")

ggplot(aes(quality, total.sulfur.dioxide), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  scale_y_continuous(lim = c(0,150)) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Total Sulfur Dioxide") +
  ggtitle("Total Sulfur Dioxide v/s quality")

Observations: Although the impact of total sulfur dioxide on quality is not clear, good-quality wines may have total sulfur dioxide above 30. However, I cannot come to a conclusion, as the plots do not reveal any pattern. Here too I have removed outliers for better readability.

Effect of Density

ggplot(aes(quality, density), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Density") +
  ggtitle("Density v/s quality")

Observations: The lower the density of the wine, the higher its quality.

Effect of pH

ggplot(aes(ratings, pH), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("pH") +
  ggtitle("pH v/s ratings")

ggplot(aes(quality, pH), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("pH") +
  ggtitle("pH v/s quality")

Observations: The lower the pH, the higher the quality of the wine.

Effect of Sulphates

ggplot(aes(ratings, sulphates), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Sulphates") +
  ggtitle("Sulphates v/s ratings")

ggplot(aes(quality, sulphates), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Sulphates") +
  ggtitle("Sulphates v/s quality")

Observations: The higher the sulphate concentration, the higher the quality of the wine.

Effect of Alcohol

ggplot(aes(ratings, alcohol), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Alcohol") +
  ggtitle("Alcohol v/s ratings")

ggplot(aes(quality, alcohol), data = df_wine) + geom_boxplot(alpha = .5) +
  geom_jitter( alpha = .2) +
  stat_summary(fun.y = "mean", geom = "point",color = "blue", shape = 8,size = 5)+
  ylab("Alcohol") +
  ggtitle("Alcohol v/s quality")

Observations: The higher the alcohol content, the higher the quality of the wine.

Bivariate Analysis

Observations

  1. Positive impact on quality ==> Citric acid, Sulphates, Alcohol
  2. Negative impact on quality ==> Volatile acidity, Chlorides, Density, pH
  3. No impact on quality / no clear observation ==> Fixed acidity, Residual sugar, Free sulfur dioxide, Total sulfur dioxide
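
The groupings above come from reading the boxplots. One way to cross-check them numerically would be a rank-based (Spearman) correlation of each numeric variable with quality. This is a sketch of my own rather than a step from the analysis above; the function name quality_correlations is hypothetical, and the toy data frame contains made-up values chosen only to show the output format.

```r
# Spearman correlation of every numeric predictor with the quality score.
quality_correlations <- function(df, target = "quality", drop = "X") {
  # Quality may already be an ordered factor, so recover the numeric scores.
  y <- as.numeric(as.character(df[[target]]))
  candidates <- setdiff(names(df), c(target, drop))
  # Keep numeric columns only (this skips the ratings factor).
  is_num <- vapply(df[candidates], is.numeric, logical(1))
  predictors <- candidates[is_num]
  rho <- vapply(predictors,
                function(v) cor(df[[v]], y, method = "spearman"),
                numeric(1))
  sort(rho, decreasing = TRUE)
}

# Made-up values for illustration only.
toy <- data.frame(
  X = 1:5,
  alcohol = c(9.0, 10.0, 11.0, 12.0, 13.0),
  volatile.acidity = c(0.9, 0.7, 0.6, 0.4, 0.3),
  quality = factor(c(3, 4, 5, 6, 7), ordered = TRUE)
)
rho <- quality_correlations(toy)
print(rho)

# With the data loaded: quality_correlations(df_wine)
```

Positive values would support the first group and negative values the second; values near zero would match the "no clear observation" group.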

Multivariate Plots Section

ggplot(aes(y = density, x = alcohol,color = quality),data = df_wine) +
  geom_point() +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Density")+
  ggtitle("Alcohol and Density with respect to quality")

ggplot(aes(y = density, x = alcohol,color = quality),data = df_wine) +
  facet_wrap(~ratings)+
  geom_point() +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Density")+
  ggtitle("Alcohol and Density with respect to quality")

Observations: There is no clear evidence that density affects quality when alcohol is held constant.

ggplot(aes(y = sulphates, x = alcohol,color = quality), data = df_wine) +
  geom_point() +
  scale_y_continuous(limits=c(0.3,1.5)) +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Sulphates")+
  ggtitle("Alcohol and Sulphates with respect to quality")

ggplot(aes(y = sulphates, x = alcohol,color = quality), data = df_wine) +
  geom_point() +
  scale_y_continuous(limits=c(0.3,1.5)) +
  facet_wrap(~ratings) +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Sulphates")+
  ggtitle("Alcohol and Sulphates with respect to quality")

Observations: It seems that the higher the alcohol content, the more sulphates the wine contains. Higher sulphates also appear to have a positive impact on quality.

ggplot(aes(y = pH, x = alcohol,color = quality),data = df_wine)+
  geom_point() +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("pH")+
  ggtitle("Alcohol and pH with respect to quality")

ggplot(aes(y = pH, x = alcohol,color = quality),data = df_wine)+
  geom_point() +
  facet_wrap(~ratings) +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("pH")+
  ggtitle("Alcohol and pH with respect to quality")

Observations: Low pH combined with high alcohol content appears to make for quality wine.

ggplot(aes(y = volatile.acidity, x = alcohol,color = quality),data = df_wine) +
  geom_point() +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Volatile Acidity")+
  ggtitle("Alcohol and Volatile acidity with respect to quality")

ggplot(aes(y = volatile.acidity, x = alcohol,color = quality),data = df_wine) +
  geom_point() +
  facet_wrap(~ratings) +
  scale_color_brewer()+
  theme_dark()+
  xlab("Alcohol")+
  ylab("Volatile Acidity")+
  ggtitle("Alcohol and Volatile acidity with respect to quality")

Observations: Lower volatile acidity combined with higher alcohol concentration appears to make for quality wine.

Multivariate Analysis

Interesting observations:

  1. Higher sulphates and higher alcohol content go with better quality wine

  2. Low pH and high alcohol content go with better quality wine

  3. Lower volatile acidity and higher alcohol concentration go with better quality wine
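
These observations could be backed with per-rating summaries, for instance the median of each variable within the bad, average and good groups. The sketch below uses base R aggregate(); the helper name summarise_by_rating and the toy data frame are hypothetical, and the values are made up for illustration rather than taken from the dataset.

```r
# Median of the chosen variables within each rating group.
summarise_by_rating <- function(df, vars, group = "ratings") {
  aggregate(df[vars], by = df[group], FUN = median)
}

# Made-up values for illustration only.
toy <- data.frame(
  ratings = factor(c("bad", "bad", "average", "average", "good", "good"),
                   levels = c("bad", "average", "good"), ordered = TRUE),
  alcohol = c(9.0, 9.4, 10.0, 10.4, 12.0, 12.4),
  sulphates = c(0.5, 0.5, 0.6, 0.7, 0.8, 0.9)
)
res <- summarise_by_rating(toy, c("alcohol", "sulphates"))
print(res)

# With the data loaded:
# summarise_by_rating(df_wine, c("alcohol", "sulphates", "pH", "volatile.acidity"))
```

Medians are used here because several of the variables are skewed, as noted in the univariate section.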

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

No, as I am not yet comfortable with machine learning.


Final Plots and Summary

Alcohol v/s quality

This plot shows that higher quality wines have a higher alcohol content.

Sulphates v/s quality

This plot shows that the higher the sulphate content, the higher the quality of the wine.

Alcohol and Sulphates with respect to quality

This plot gives a clear picture of how sulphates and alcohol content together positively influence the quality of wine.


Reflection

This dataset provided information on red variants of the Portuguese Vinho Verde wine. I began by importing the required libraries and loading the data, which was in “.csv” format. I did some univariate analysis to understand each attribute in the dataset. Bivariate analysis gave a lot of information about the data and helped me find insights into how the quality of wine is affected by various factors. I wanted to know more about how two or more factors together affect the quality of wine, so I performed the multivariate analysis whose results are explained above. Future work on the project could include building supervised learning models that help a company manufacture quality wines.